Hilbert-Guided Block-Sparse Local Attention
The quadratic compute and memory costs of global self-attention severely limit its use on high-resolution images. Local attention reduces complexity by restricting attention to neighborhoods. Block-sparse kernels can further improve the efficiency of local attention, but conventional local attention patterns often fail to deliver significant speedups because tokens within a window are not contiguous in the 1D sequence. This work proposes a novel method for constructing windows and neighborhoods based on the Hilbert curve. Image tokens are first reordered along a Hilbert curve, and windows and neighborhoods are then formed on the reordered 1D sequence. From a block-sparse perspective, this strategy significantly increases block sparsity and can be combined with existing block-sparse kernels to improve the efficiency of 2D local attention. Experiments show that the proposed Hilbert Window Attention and Hilbert Slide Attention accelerate window attention and slide attention by about 4× and 18×, respectively. To assess practicality, the strategy is instantiated as the Hilbert Window Transformer and the Hilbert Neighborhood Transformer, both of which achieve end-to-end speedups with minimal accuracy loss. Overall, combining Hilbert-guided local attention with block-sparse kernels offers a general and practical approach to enhancing the efficiency of 2D local attention for images. The code is available at https://github.com/Y
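The reordering step at the heart of this approach can be sketched with the standard Hilbert-curve index mapping (a minimal illustration under assumed grid and window sizes, not the authors' kernel implementation):

```python
def rot(s, x, y, rx, ry):
    """Rotate/flip a quadrant as needed (standard Hilbert-curve helper)."""
    if ry == 0:
        if rx == 1:
            x, y = s - 1 - x, s - 1 - y
        x, y = y, x
    return x, y

def d2xy(n, d):
    """Map a Hilbert index d to (x, y) on an n x n grid (n a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_order(n):
    """Token positions of an n x n image grid in Hilbert visiting order."""
    return [d2xy(n, d) for d in range(n * n)]

def hilbert_windows(n, window):
    """Reorder tokens along the Hilbert curve, then split the 1D sequence
    into contiguous windows. Each window is spatially compact, so the
    attention pattern becomes block-dense after reordering."""
    order = hilbert_order(n)
    return [order[i:i + window] for i in range(0, n * n, window)]
```

Because the curve exhausts each quadrant before moving on, a window of 16 consecutive reordered tokens on an 8×8 grid covers exactly one 4×4 spatial block; that contiguity is what makes the resulting mask friendly to block-sparse kernels.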
The Gyro-Structure of Some Matrix Manifolds
This supplemental material provides the proofs for the theorems and lemmas presented in our paper. For all datasets, we use interpolation to create sequences of the same length. For SPDNet and SPDNetBN, we compute a covariance matrix to represent an input sequence, as in [20]. Our networks are implemented with the TensorFlow framework. The number of frames in each sequence is set to 100.
From Static to Dynamic: A Streaming RAG Approach to Real-time Knowledge Base
Dynamic streams from news feeds, social media, sensor networks, and financial markets challenge static RAG frameworks: full-scale indices incur high memory costs, periodic rebuilds introduce latency that undermines data freshness, and naive sampling sacrifices semantic coverage. We present Streaming RAG, a unified pipeline that combines multi-vector cosine screening, mini-batch clustering, and a counter-based heavy-hitter filter to maintain a compact prototype set. We further prove an approximation bound, $E[R(K_t)] \ge R^* - L\Delta$, linking retrieval quality to clustering variance. An incremental index-upsert mechanism refreshes prototypes without interrupting queries. Experiments on eight real-time streams show statistically significant gains in Recall@10 (up to 3 points, p < 0.01), end-to-end latency below 15 ms, and throughput above 900 documents per second under a 150 MB budget. A hyperparameter sensitivity analysis over cluster count, admission probability, relevance threshold, and counter capacity validates the default settings. In open-domain question answering with GPT-3.5 Turbo, we record a 3.2-point gain in Exact Match and a 2.8-point gain in F1 on SQuAD; abstractive summarization yields ROUGE-L improvements. Streaming RAG establishes a new Pareto frontier for retrieval augmentation.
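The counter-based heavy-hitter filter in such a pipeline can be illustrated with a standard Misra-Gries sketch (a generic stand-in; the paper's exact admission rule and counter capacity are not specified here):

```python
def misra_gries(stream, capacity):
    """Track up to `capacity` candidate heavy hitters over a stream using
    O(capacity) counters. Any item occurring more than
    len(stream) / (capacity + 1) times is guaranteed to survive."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # No free counter: decrement all, evicting those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

Applied to a stream of document identifiers, the surviving counters give the frequently recurring documents worth keeping in the prototype set under a fixed memory budget.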
Fast Visuomotor Policies via Partial Denoising
Chen, Haojun, Liu, Minghao, Ma, Xiaojian, Ma, Zailin, Wu, Huimin, Ma, Chengdong, Chen, Yuanpei, Zhong, Yifan, Wang, Mingzhi, Li, Qing, Yang, Yaodong
Diffusion policies are widely adopted in complex visuomotor tasks for their ability to capture multimodal action distributions. However, the multiple sampling steps required for action generation significantly harm real-time inference efficiency, which limits their applicability in long-horizon tasks and real-time decision-making scenarios. Existing acceleration techniques reduce sampling steps by approximating the original denoising process but inevitably introduce unacceptable performance loss. Here we propose Falcon, which mitigates this trade-off and achieves further acceleration. The core insight is that visuomotor tasks exhibit sequential dependencies between actions at consecutive time steps. Falcon leverages this property to avoid denoising from a standard normal distribution at each decision step. Instead, it starts denoising from partially denoised actions derived from historical information, significantly reducing the number of denoising steps, while incorporating current observations to achieve performance-preserving acceleration of action generation. Importantly, Falcon is a training-free algorithm that can be applied as a plug-in to further improve decision efficiency on top of existing acceleration techniques. We validated Falcon in 46 simulated environments, demonstrating a 2-7x speedup with negligible performance degradation, offering a promising direction for efficient visuomotor policy design.
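The warm-start idea can be sketched on a toy scalar "diffusion" with a dummy denoiser (purely conceptual; the step counts, noise levels, and denoiser are arbitrary assumptions, not the Falcon algorithm itself):

```python
import random

def denoise_step(x, t, target):
    """Dummy one-step denoiser: nudge x toward the (known) target.
    A real policy would predict the noise with a learned network here."""
    return x + (target - x) / (t + 1)

def generate(target, total_steps, x_init=None, start_step=None):
    """Run the reverse process. With x_init/start_step given, denoising is
    warm-started from a partially denoised action (the shortcut exploited
    by Falcon-style methods) instead of a fresh Gaussian sample."""
    if x_init is None:
        x = random.gauss(0.0, 1.0)   # cold start: pure noise
        t = total_steps
    else:
        x = x_init                   # warm start: previous action + some noise
        t = start_step
    steps = 0
    while t > 0:
        x = denoise_step(x, t, target)
        t -= 1
        steps += 1
    return x, steps

random.seed(0)
# Decision step 1: full denoising from noise.
a1, n1 = generate(target=1.0, total_steps=20)
# Decision step 2: consecutive actions are similar, so re-noise the previous
# action only up to an intermediate level and denoise from there.
noisy_prev = a1 + random.gauss(0.0, 0.3)
a2, n2 = generate(target=1.05, total_steps=20, x_init=noisy_prev, start_step=5)
```

The warm-started call runs 5 denoising iterations instead of 20, which is the source of the speedup; the sequential similarity of consecutive actions is what keeps the shortcut from hurting quality.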
ColA: Collaborative Adaptation with Gradient Learning
Diao, Enmao, Le, Qi, Wu, Suya, Wang, Xinran, Anwar, Ali, Ding, Jie, Tarokh, Vahid
A primary function of back-propagation is to compute the gradients of both the hidden representations and the parameters for optimization with gradient descent. Training large models requires high computational costs due to their vast parameter sizes. While Parameter-Efficient Fine-Tuning (PEFT) methods aim to train smaller auxiliary models to reduce computational cost, they still incur computational overhead, especially in Fine-Tuning as a Service (FTaaS) for numerous users. We introduce Collaborative Adaptation (ColA) with Gradient Learning (GL), a parameter-free, model-agnostic fine-tuning approach that decouples the computation of the gradients of the hidden representations from that of the parameters. Compared to PEFT methods, ColA facilitates more cost-effective FTaaS by offloading the computation of the parameter gradients to low-cost devices. We also provide a theoretical analysis of ColA and experimentally demonstrate that ColA can perform on par with or better than existing PEFT methods on various benchmarks.
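The decoupling can be made concrete for a single linear layer (a simplified numpy sketch of the chain-rule factorization, not the paper's full algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward pass on the "server": y = x @ W. To keep back-propagation going,
# the server only needs the gradient w.r.t. the hidden representation,
# g_x = g_y @ W.T; it never has to materialize the parameter gradient.
x = rng.standard_normal((4, 3))      # batch of hidden inputs
W = rng.standard_normal((3, 2))      # adapter parameters
g_y = rng.standard_normal((4, 2))    # upstream gradient dL/dy

g_x = g_y @ W.T                      # stays on the server (chain rule)

# The parameter gradient decouples: dL/dW = x.T @ g_y depends only on the
# cached activations and the upstream gradient, so a low-cost device can
# compute it and apply the update offline.
g_W = x.T @ g_y
W_updated = W - 0.1 * g_W
```

The key point is that `g_x` and `g_W` share inputs but not each other, so the two computations can run on different devices.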
Morphological Classification of Extragalactic Radio Sources Using Gradient Boosting Methods
Darya, Abdollah Masoud, Fernini, Ilias, Vellasco, Marley, Hussain, Abir
The field of radio astronomy is witnessing a boom in the amount of data produced per day due to newly commissioned radio telescopes. One of the most crucial problems in this field is the automatic classification of extragalactic radio sources based on their morphologies. Most recent contributions to the morphological classification of extragalactic radio sources have proposed classifiers based on convolutional neural networks. Alternatively, this work proposes gradient boosting machine learning methods accompanied by principal component analysis as data-efficient alternatives to convolutional neural networks. Recent findings have shown the efficacy of gradient boosting methods in outperforming deep learning methods for classification problems with tabular data. The gradient boosting methods considered in this work are based on the XGBoost, LightGBM, and CatBoost implementations. This work also studies the effect of dataset size on classifier performance. A three-class classification problem is considered, based on the three main Fanaroff-Riley classes: class 0, class I, and class II, using radio sources from the Best-Heckman sample. All three proposed gradient boosting methods outperformed a state-of-the-art convolutional neural network-based classifier using less than a quarter of the number of images, with CatBoost having the highest accuracy. This was mainly due to the superior accuracy of gradient boosting methods in classifying Fanaroff-Riley class II sources, with 3–4% higher recall.
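The PCA preprocessing that turns image cutouts into tabular features can be sketched with a plain numpy SVD (illustrative only; the image size is arbitrary, and the downstream XGBoost/LightGBM/CatBoost classifier is indicated by a comment rather than a dependency):

```python
import numpy as np

def pca_features(images, n_components):
    """Flatten images and project onto the top principal components,
    yielding a compact tabular representation suitable for gradient
    boosting classifiers (XGBoost / LightGBM / CatBoost)."""
    X = images.reshape(len(images), -1).astype(float)
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data matrix; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(1)
images = rng.standard_normal((100, 16, 16))   # stand-in for radio-source cutouts
Z = pca_features(images, n_components=10)
# Z (100 x 10) would then be fed to e.g. xgboost.XGBClassifier as tabular data.
```

Collapsing each image from 256 pixels to 10 components is what lets tree-based boosting methods treat morphology classification as a small tabular problem.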
SQLi Detection with ML: A data-source perspective
Almost 50 years after the invention of SQL, injection attacks are still top-tier vulnerabilities in today's ICT systems. Consequently, SQLi detection remains an active area of research, where the most recent works incorporate machine learning techniques into the proposed solutions. In this work, we highlight the shortcomings of previous ML-based results, focusing on four aspects: the evaluation methods, the optimization of the model parameters, the distribution of the utilized datasets, and the feature selection. Since no single work has explored all of these aspects satisfactorily, we fill this gap and provide an in-depth and comprehensive empirical analysis. Moreover, we cross-validate the trained models using data from other distributions. This aspect of ML models trained for SQLi detection has never been studied before, yet the sensitivity of a model's performance to the data source is crucial for any real-life deployment.
Smooth Mathematical Function from Compact Neural Networks
This paper addresses smooth function approximation by neural networks (NNs). Mathematical or physical functions can be replaced by NN models through regression. In this study, we obtain NNs that generate highly accurate and highly smooth functions from only a few weight parameters, by examining several aspects of regression. First, we reinterpret the inner workings of NNs for regression and, as a consequence, propose a new activation function: the integrated sigmoid linear unit (ISLU). Then, the special characteristics of metadata for regression, which differ from those of other data such as images or sound, are discussed with a view to improving NN performance. Finally, a simple hierarchical NN that generates models substituting for mathematical functions is presented, and a new batch concept, the "meta-batch", which improves NN performance severalfold, is introduced. The new activation function, the meta-batch method, the features of numerical data, meta-augmentation with metaparameters, and a structure of NNs generating a compact multi-layer perceptron (MLP) are the essential elements of this study.